MONOTONIC PROPERTIES OF THE LEAST SQUARES MEAN

JIMMIE LAWSON AND YONGDO LIM 



Abstract. We settle an open problem of several years' standing by showing that
the least squares mean for positive definite matrices is monotone for the usual
(Loewner) order. Indeed we show this is a special case of its appropriate
generalization to partially ordered complete metric spaces of nonpositive curvature. Our
techniques extend to establish other basic properties of the least squares mean such
as continuity and joint concavity. Moreover, we introduce weighted least squares
means and extend our results to this setting.



1. Introduction 

Not only does the study of positive definite matrices remain a flourishing area
of mathematical investigation (see e.g., the recent monograph of R. Bhatia [6] and
references therein), but positive definite matrices have become fundamental
computational objects in many areas of engineering, statistics, quantum information, and
applied mathematics. They appear as covariance matrices in statistics, as elements
of the search space in convex and semidefinite programming, as kernels in machine
learning, as density matrices in quantum information, and as diffusion tensors in
medical imaging, to cite a few. A variety of metric-based computational algorithms
for positive definite matrices have arisen for approximation, interpolation, filtering,
estimation, and averaging, the last being the concern of this paper. In recent years,
it has been increasingly recognized that the Euclidean distance is often not the most
suitable for the set of positive definite matrices (the positive symmetric cone
$\mathbb{P} = \mathbb{P}_m$ for some $m$) and that working with the proper geometry
does matter in computational problems. It is thus not surprising that there has been
increasing interest in the trace metric, the distance metric arising from the natural
Riemannian structure on $\mathbb{P}$ making it a Riemannian manifold, indeed a
symmetric space, of nonpositive curvature. (Recall the trace metric distance between two
positive definite matrices is given by
$\delta(A,B) = \bigl(\sum_{i=1}^{m} \log^2 \lambda_i(A^{-1}B)\bigr)^{1/2}$,
where $\lambda_i(X)$ denotes the $i$-th eigenvalue of $X$ in non-decreasing order.)
Recent contributions that have advocated the use of this metric in applications
include [13], [26], [30] for tensor computing in medical imaging and [4] for radar
processing.
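The trace metric formula recalled above can be evaluated numerically. The following is a minimal sketch, assuming real symmetric positive definite inputs and using only NumPy; the helper names `spd_fun` and `trace_metric` are illustrative and do not come from the paper. It also checks, on one example, the invariance of $\delta$ under congruence and inversion that is used later in the paper.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)          # A = V diag(w) V^T with w > 0
    return (V * f(w)) @ V.T

def trace_metric(A, B):
    """delta(A, B) = sqrt(sum_i log^2 lambda_i(A^{-1} B))."""
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    # A^{-1/2} B A^{-1/2} is symmetric and has the same eigenvalues as A^{-1} B.
    C = A_inv_half @ B @ A_inv_half
    lam = np.linalg.eigvalsh((C + C.T) / 2)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

A = np.diag([1.0, np.e])
B = np.eye(2)
M = np.array([[2.0, 1.0], [0.0, 1.0]])   # an arbitrary invertible matrix

d_AB = trace_metric(A, B)
d_congruent = trace_metric(M @ A @ M.T, M @ B @ M.T)
d_inverse = trace_metric(np.linalg.inv(A), np.linalg.inv(B))
print(d_AB, d_congruent, d_inverse)
```

Here the eigenvalues of $A^{-1}B$ are $1$ and $1/e$, so the three printed distances should agree up to rounding.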

Since the pioneering paper of Kubo and Ando [17], an extensive theory of
two-variable means has sprung up for positive matrices and operators, but the
multivariable case for $n > 2$ has remained problematic. Once one realizes, however, that the
matrix geometric mean $\mathfrak{G}_2(A,B) = A\#B := A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$
is the metric midpoint of $A$ and $B$ for the trace metric (see, e.g., [HIIH]), it is
natural to use an averaging technique over this metric to extend this mean to a larger
number of variables. First M. Moakher [25] and then Bhatia and Holbrook [7], [8]
suggested extending the geometric mean to $n$ points by taking the mean to be the
unique minimizer of the sum of the squares of the distances:

$$\mathfrak{G}_n(A_1,\dots,A_n) = \operatorname*{arg\,min}_{X\in\mathbb{P}} \sum_{i=1}^{n} \delta^2(X,A_i).$$

This idea had been anticipated by Élie Cartan (see, for example, Section 6.1.5 of [5]),
who showed among other things that such a unique minimizer exists if the points all lie in
a convex ball in a Riemannian manifold, which is enough to deduce the existence of
the least squares mean globally for $\mathbb{P}$.
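The midpoint property of the two-variable geometric mean can be checked numerically. The sketch below is illustrative only: it assumes real symmetric positive definite matrices, uses NumPy, and the helper names (`spd_fun`, `trace_metric`, `geo_mean`) are not from the paper. It verifies on one example that $G = A\#B$ satisfies $\delta(A,G) = \delta(G,B) = \tfrac12\delta(A,B)$ and the Riccati equation $GA^{-1}G = B$.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def trace_metric(A, B):
    """delta(A, B) = sqrt(sum_i log^2 lambda_i(A^{-1} B))."""
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    C = A_inv_half @ B @ A_inv_half
    lam = np.linalg.eigvalsh((C + C.T) / 2)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def geo_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    A_half = spd_fun(A, np.sqrt)
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    C = A_inv_half @ B @ A_inv_half
    return A_half @ spd_fun((C + C.T) / 2, np.sqrt) @ A_half

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[5.0, -1.0], [-1.0, 1.0]])
G = geo_mean(A, B)

half = trace_metric(A, B) / 2
d_AG = trace_metric(A, G)
d_GB = trace_metric(G, B)
riccati_ok = np.allclose(G @ np.linalg.inv(A) @ G, B)
print(d_AG, d_GB, half, riccati_ok)
```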

Another approach, independent of metric notions, was suggested by Ando, Li, and
Mathias [2] via a "symmetrization procedure" and induction. The Ando-Li-Mathias
paper was also important for listing, and deriving for their mean, ten desirable
properties for extended geometric means $g : \mathbb{P}^n \to \mathbb{P}$ that one might
anticipate from properties of the two-variable geometric mean, where
$\mathbb{P} = \mathbb{P}_m$ denotes the convex cone of $m \times m$ positive definite
Hermitian matrices equipped with the Loewner order $\le$. For
$\mathbb{A} = (A_1,\dots,A_n)$, $\mathbb{B} = (B_1,\dots,B_n) \in \mathbb{P}^n$,
$\sigma \in S^n$ a permutation on $n$ letters, and
$a = (a_1,\dots,a_n) \in \mathbb{R}_{++}^n$ ($\mathbb{R}_{++} = (0,\infty)$), these are

(P1) (Consistency with scalars) $g(\mathbb{A}) = (A_1 \cdots A_n)^{1/n}$ if the $A_i$'s commute;

(P2) (Joint homogeneity) $g(a_1A_1,\dots,a_nA_n) = (a_1\cdots a_n)^{1/n} g(\mathbb{A})$;

(P3) (Permutation invariance) $g(\mathbb{A}_\sigma) = g(\mathbb{A})$, where $\mathbb{A}_\sigma = (A_{\sigma(1)},\dots,A_{\sigma(n)})$;

(P4) (Monotonicity) If $B_i \le A_i$ for all $1 \le i \le n$, then $g(\mathbb{B}) \le g(\mathbb{A})$;

(P5) (Continuity) $g$ is continuous;

(P6) (Congruence invariance) $g(M^*\mathbb{A}M) = M^* g(\mathbb{A}) M$ for any invertible
matrix $M$, where $M^*(A_1,\dots,A_n)M = (M^*A_1M,\dots,M^*A_nM)$;

(P7) (Joint concavity) $g(\lambda\mathbb{A} + (1-\lambda)\mathbb{B}) \ge \lambda g(\mathbb{A}) + (1-\lambda) g(\mathbb{B})$ for $0 \le \lambda \le 1$;

(P8) (Self-duality) $g(A_1^{-1},\dots,A_n^{-1})^{-1} = g(A_1,\dots,A_n)$;

(P9) (Determinantal identity) $\operatorname{Det} g(\mathbb{A}) = \prod_{i=1}^{n} (\operatorname{Det} A_i)^{1/n}$; and

(P10) (AGH mean inequalities) $n\bigl(\sum_{i=1}^{n} A_i^{-1}\bigr)^{-1} \le g(\mathbb{A}) \le \frac{1}{n}\sum_{i=1}^{n} A_i$.

We call a mean $g$ of $n$ variables satisfying these properties a symmetric geometric
mean, the adjective "symmetric" describing its invariance under permutations,
property (P3).

The Ando-Li-Mathias mean proved to be computationally cumbersome, and Bini,
Meini, and Poloni [9] suggested an alternative with more rapid convergence properties,
which also satisfied the ten axioms. One notes in particular that while the axioms
characterize the two-variable case, this is no longer true in the $n$-variable case, $n > 2$.

These ten properties may be generalized to the setting of weighted geometric means.
We recall that the two-variable weighted geometric mean is given by

$$t \mapsto \mathfrak{G}_2(1-t,t;A,B) = A\#_tB := A^{1/2}(A^{-1/2}BA^{-1/2})^{t}A^{1/2},$$

which parametrizes the unique geodesic passing through $A$ and $B$ for $A \ne B$.
A weighted geometric mean of $n$ positive definite matrices should be
defined for each weight, where the weights $\omega = (w_1,\dots,w_n)$ vary over
$\Delta_n$, the simplex of positive probability vectors convexly spanned by the unit
coordinate vectors. We define a weighted geometric mean of $n$ positive definite
matrices to be a map $g : \Delta_n \times \mathbb{P}^n \to \mathbb{P}$ satisfying the
following properties:

(P1) (Consistency with scalars) $g(\omega;\mathbb{A}) = A_1^{w_1} \cdots A_n^{w_n}$ if the $A_i$'s commute;

(P2) (Joint homogeneity) $g(\omega; a_1A_1,\dots,a_nA_n) = a_1^{w_1}\cdots a_n^{w_n}\, g(\omega;\mathbb{A})$;

(P3) (Permutation invariance) $g(\omega_\sigma;\mathbb{A}_\sigma) = g(\omega;\mathbb{A})$, where $\omega_\sigma = (w_{\sigma(1)},\dots,w_{\sigma(n)})$;

(P4) (Monotonicity) If $B_i \le A_i$ for all $1 \le i \le n$, then $g(\omega;\mathbb{B}) \le g(\omega;\mathbb{A})$;

(P5) (Continuity) The map $g(\omega;\cdot)$ is continuous;

(P6) (Congruence invariance) $g(\omega; M^*\mathbb{A}M) = M^* g(\omega;\mathbb{A}) M$ for any invertible $M$;

(P7) (Joint concavity) $g(\omega;\lambda\mathbb{A}+(1-\lambda)\mathbb{B}) \ge \lambda g(\omega;\mathbb{A}) + (1-\lambda) g(\omega;\mathbb{B})$ for $0 \le \lambda \le 1$;

(P8) (Self-duality) $g(\omega; A_1^{-1},\dots,A_n^{-1})^{-1} = g(\omega; A_1,\dots,A_n)$;

(P9) (Determinantal identity) $\operatorname{Det} g(\omega;\mathbb{A}) = \prod_{i=1}^{n} (\operatorname{Det} A_i)^{w_i}$; and

(P10) (AGH weighted mean inequalities) $\bigl(\sum_{i=1}^{n} w_iA_i^{-1}\bigr)^{-1} \le g(\omega;\mathbb{A}) \le \sum_{i=1}^{n} w_iA_i$.

We note that the two-variable weighted geometric mean
$\mathfrak{G}_2(1-t,t;A,B) = A\#_tB$, $t \in [0,1]$, satisfies (P1)-(P10).

In their study of the symmetric least squares mean, Moakher [25] and Bhatia and
Holbrook [7], [8] have derived for it some of the axiomatic properties (P1)-(P10)
satisfied by the Ando-Li-Mathias geometric mean: consistency with scalars, joint
homogeneity, permutation invariance, congruence invariance, and self-duality (the last two
being true since congruence transformations and inversion are isometries). Further,
based on computational experimentation, Bhatia and Holbrook conjectured
monotonicity for the least squares mean (problem 19 in "Open problems in matrix theory"
by X. Zhan [31]). Providing a positive solution (Corollary 6.4) to this conjecture was
the original motivation for this paper.

In this paper we introduce the weighted least squares mean
$\mathfrak{G}_n(\omega; A_1,\dots,A_n)$ of $(A_1,\dots,A_n)$ with the weight
$\omega = (w_1,\dots,w_n) \in \Delta_n$, which is defined to be

$$\mathfrak{G}_n(\omega; A_1,\dots,A_n) = \operatorname*{arg\,min}_{X\in\mathbb{P}} \sum_{i=1}^{n} w_i\,\delta^2(X,A_i). \tag{1.1}$$

Computing appropriate derivatives as in ([8], [25]) yields that the weighted least squares
mean coincides with the unique positive definite solution of the equation

$$\sum_{i=1}^{n} w_i \log(X^{-1}A_i) = 0. \tag{1.2}$$
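As an illustration of equation (1.2), the sketch below solves it numerically by a simple fixed-point (Riemannian gradient) iteration. This iteration is a commonly used scheme and is an assumption of this example, not a method taken from the paper; its convergence is not guaranteed in general, and the helper names `spd_fun`, `log_sum`, and `ls_mean` are hypothetical. The code works with the equivalent symmetric form $\sum_i w_i \log(X^{-1/2}A_iX^{-1/2}) = 0$, which follows from $\log(X^{-1}A_i) = X^{-1/2}\log(X^{-1/2}A_iX^{-1/2})X^{1/2}$.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a real symmetric matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def log_sum(X, mats, weights):
    """T(X) = sum_i w_i log(X^{-1/2} A_i X^{-1/2}); T(X) = 0 characterizes the mean."""
    X_inv_half = spd_fun(X, lambda w: w ** -0.5)
    T = np.zeros_like(X)
    for wi, A in zip(weights, mats):
        C = X_inv_half @ A @ X_inv_half
        T += wi * spd_fun((C + C.T) / 2, np.log)
    return T

def ls_mean(mats, weights, tol=1e-13, max_iter=500):
    """Iterate X <- X^{1/2} exp(T(X)) X^{1/2}, starting at the arithmetic mean."""
    X = sum(wi * A for wi, A in zip(weights, mats))
    for _ in range(max_iter):
        T = log_sum(X, mats, weights)
        if np.linalg.norm(T) < tol:
            break
        X_half = spd_fun(X, np.sqrt)
        X = X_half @ spd_fun(T, np.exp) @ X_half
        X = (X + X.T) / 2                     # remove rounding asymmetry
    return X

weights = [0.5, 0.3, 0.2]
mats = [np.array([[2.0, 1.0], [1.0, 3.0]]),
        np.array([[5.0, -1.0], [-1.0, 1.0]]),
        np.array([[1.0, 0.0], [0.0, 4.0]])]
X = ls_mean(mats, weights)
residual = np.linalg.norm(log_sum(X, mats, weights))

# Determinantal identity (P9): Det X = prod_i (Det A_i)^{w_i}.
det_lhs = np.linalg.det(X)
det_rhs = np.prod([np.linalg.det(A) ** wi for wi, A in zip(weights, mats)])

# Commuting case (P1): the mean is A_1^{w_1} ... A_n^{w_n}.
diag_mats = [np.diag([1.0, 4.0]), np.diag([2.0, 1.0]), np.diag([4.0, 2.0])]
X_diag = ls_mean(diag_mats, weights)
expected_diag = np.diag(np.exp(sum(wi * np.log(np.diag(A))
                                   for wi, A in zip(weights, diag_mats))))
print(residual, det_lhs, det_rhs)
```

The residual and the determinant comparison serve as sanity checks on the computed solution rather than as a proof of convergence.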

It is not difficult to see from (1.1) and (1.2) and some elementary facts about
matrices and the trace metric that the weighted least squares mean satisfies (P1)-(P3),
(P6), (P8) and (P9). In this paper we show that the weighted least squares
mean satisfies all the properties (P1)-(P10) by verifying the additional
properties (P4), (P5), (P7), and (P10). As far as we know, this is the first verification of
properties (P4) and (P7) in both the weighted and unweighted cases and of (P10) in
the weighted case, the unweighted case having been shown by Yamazaki ([29]). We
thus see that the (weighted) least squares mean provides another important example
of a (weighted) geometric mean. We further show that the weighted least squares
mean is non-expansive:
$\delta(\mathfrak{G}_n(\omega; A_1,\dots,A_n), \mathfrak{G}_n(\omega; B_1,\dots,B_n)) \le \sum_{i=1}^{n} w_i\,\delta(A_i,B_i)$.

The main tools of the paper involve the theory of nonpositively curved metric
spaces and techniques from probability and random variable theory and the recent
combination of the two, particularly by K.-T. Sturm [28]. Not only are these tools
crucial for our developments, but they also, we believe, significantly enhance the
potential usefulness of the least squares mean.



2. Metric Spaces and Means 

The setting appropriate for our considerations is that of globally nonpositively
curved metric spaces, which we call NPC spaces for short (since we do not
consider the locally nonpositively curved spaces). These are complete metric spaces
$M$ satisfying: for all $x, y \in M$, there exists $m \in M$ such that for all $z \in M$

$$d^2(m,z) \le \tfrac12 d^2(x,z) + \tfrac12 d^2(y,z) - \tfrac14 d^2(x,y). \tag{2.3}$$

Such spaces are also called (global) CAT(0)-spaces or Hadamard spaces. The theory
of such spaces is quite extensive; see, e.g., [3], [10], [15], [28]. In particular the $m$
appearing in (2.3) is the unique metric midpoint between $x$ and $y$. By inductively
choosing midpoints for dyadic rationals and extending by continuity, one obtains for
each $x \ne y$ a unique metric minimal geodesic $\gamma : [0,1] \to M$ satisfying
$d(\gamma(t),\gamma(s)) = |t-s|\,d(x,y)$. We denote $\gamma(t)$ by $x\#_t y$ and call it
the $t$-weighted mean of $x$ and $y$. The midpoint $x\#_{1/2}y$ we denote simply as
$x\#y$. We remark that by uniqueness $x\#_t y = y\#_{1-t}x$; in particular, $x\#y = y\#x$.

Remark 2.1. Equation (2.3) is sometimes referred to as the semiparallelogram law,
since it can be derived from the parallelogram law in Hilbert spaces by replacing the
equality with an inequality (see [19]). It is satisfied by the length metric in any
simply connected nonpositively curved Riemannian manifold [18]. Hence the metric
definition represents a metric generalization of nonpositive curvature. The trace
metric on the Riemannian symmetric space of positive definite matrices is a particular
example ([IS [19]).
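For orientation, here is a short worked derivation of the equality case of (2.3), under the assumption that the space is a real Hilbert space with norm $\|\cdot\|$ and inner product $\langle\cdot,\cdot\rangle$ and that $m = \tfrac12(x+y)$. The second line is solved for the inner product and substituted into the first to give the third.

```latex
\begin{align*}
\|m - z\|^2 &= \bigl\|\tfrac12 (x - z) + \tfrac12 (y - z)\bigr\|^2
  = \tfrac14\|x - z\|^2 + \tfrac14\|y - z\|^2 + \tfrac12\langle x - z,\, y - z\rangle,\\
\|x - y\|^2 &= \|(x - z) - (y - z)\|^2
  = \|x - z\|^2 + \|y - z\|^2 - 2\langle x - z,\, y - z\rangle,\\
\|m - z\|^2 &= \tfrac12\|x - z\|^2 + \tfrac12\|y - z\|^2 - \tfrac14\|x - y\|^2 .
\end{align*}
```

Replacing the final equality by "$\le$" gives the semiparallelogram law.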
Equation (2.3) admits a more general formulation in terms of the weighted mean
(see e.g. [28, Proposition 2.3]). For all $0 \le t \le 1$ we have

$$d^2(x\#_t y, z) \le (1-t)\,d^2(x,z) + t\,d^2(y,z) - t(1-t)\,d^2(x,y). \tag{2.4}$$

An $n$-mean on a set $X$ is a function $\mu : X^n \to X$ satisfying the idempotency
law $\mu(x,x,\dots,x) = x$. It is symmetric if it is invariant under all permutations
$\sigma$ of $\{1,\dots,n\}$, i.e., $\mu(x_1,\dots,x_n) = \mu(x_{\sigma(1)},\dots,x_{\sigma(n)})$.
For a metric space $X$ with weighted mean, the operation $x\#_t y$ is a 2-mean for
each $t$. A special case is the midpoint mean $x\#y$ for $t = 1/2$, which is symmetric.

The problem of extending the geometric mean of two positive definite matrices to
an $n$-variable mean for $n \ge 3$ generalizes to the setting of metric spaces with weighted
means. Under appropriate metric hypotheses, all of which are implied by the NPC
condition, the symmetrization procedure applies and inductively yields multivariable
means extending $x\#y$ for each $n \ge 3$; see Es-Sahib and Heinrich [11] and the authors
PU]. The weighted 2-means and the mean of Bini, Meini, and Poloni [9] also generalize
to NPC-spaces, and even weaker metric settings [21].

The least squares mean can be immediately formulated in any metric space $(M,d)$:

$$\mathfrak{G}_n(a_1,\dots,a_n) = \operatorname*{arg\,min}_{z\in M} \sum_{i=1}^{n} d^2(z,a_i). \tag{2.5}$$

In general this mean is not defined, since the minimizer may fail to exist or fail to be
unique. One also has a weighted version of the mean. Given $(a_1,\dots,a_n) \in M^n$ and
positive real numbers $w_1,\dots,w_n$ summing to 1, we define

$$\mathfrak{G}_n(w_1,\dots,w_n; a_1,\dots,a_n) := \operatorname*{arg\,min}_{z\in M} \sum_{i=1}^{n} w_i\,d^2(z,a_i), \tag{2.6}$$

provided the minimizer exists and is unique. As mentioned previously, it was shown
by E. Cartan (see [5]) that this is the case if the points all lie in a convex ball in a
Riemannian manifold. For our purposes, we note that existence and uniqueness hold
in general for NPC spaces, as can be readily deduced from the uniform convexity of
the metric; see [28, Propositions 1.7, 4.3]. Note that the mean in equation (2.5) is
a symmetric mean and the one in (2.6) is permutation invariant in the sense of
property (P3) for weighted means given in the Introduction. By taking $w_i = 1/n$
for each $i = 1,\dots,n$, we see that the former mean is a special case of the latter, so
we work with the weighted case in what follows. Although this mean is sometimes
referred to as the Karcher mean in light of its appearance in his work on Riemannian
manifolds [12], we will refer to it as the weighted least squares mean, or simply as the
least squares mean.



3. The method 

Since our method of proof in this paper departs rather radically from previous
approaches to the theory of matrix means, we judge that it is worthwhile to give
a quick, informal, and intuitive overview of our approach and methods. Suppose
that we are given an NPC metric space $(M,d)$, a tuple $(a_1,\dots,a_n) \in M^n$, and a
weight $(w_1,\dots,w_n)$ of positive real numbers. We imagine carrying out a sequence of
independent trials in which we randomly choose in each trial an integer from the set
$\{1,\dots,n\}$ in such a way that $i$ is chosen with probability $w_i$. If $i_k$ is chosen on the
$k$-th trial, then we set $x_k = a_{i_k}$. We define a "random walk" $\{s_k\}$ using this data by
setting $s_1 = x_1$, $s_2 = s_1\#x_2$, $s_3 = s_2\#_{1/3}x_3$, and in general
$s_k = s_{k-1}\#_{1/k}x_k$, that is, at stage $k$ we move from $s_{k-1}$ toward $x_k$ a
fraction $1/k$ of the distance between them. It is then a remarkable consequence of
Sturm's Theorem 4.7 of [28] that as we run through all possible outcomes of this
procedure, almost always the sequence $\{s_k\}_{k\in\mathbb{N}}$ will converge to
$\mathfrak{G}_n(w_1,\dots,w_n; a_1,\dots,a_n)$, the weighted least squares mean.

This machinery provides a powerful tool for the study of the least squares mean.
Many properties of the weighted 2-means can be shown to extend to their finite
iterations $s_k$, as defined in the previous paragraph, and then shown to be preserved
in passing to the limit, the weighted least squares mean. It could also potentially
be a useful computational tool to approximate the weighted least squares mean by
simulating the preceding random walk up through some stage $s_k$ for large enough $k$.

For a $k$-tuple $(x_1,\dots,x_k) \in M^k$, we can compute $s_k$ as defined in the first
paragraph and use this value to define a mean $S_k(x_1,\dots,x_k) = s_k$. In [28] Sturm has
called this mean the inductive mean for NPC spaces, a mean which appeared earlier
in [271 E] for positive definite matrices. Its explicit definition is given inductively by
$S_2(x,y) = x\#y$ and for $k \ge 3$, $S_k(x_1,\dots,x_k) = S_{k-1}(x_1,\dots,x_{k-1})\#_{1/k}\,x_k$.
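The random walk just described can be simulated directly. The sketch below is illustrative only: it assumes real symmetric positive definite matrices with the trace metric, uses NumPy's random generator with a fixed seed, and the names `weighted_geo_mean` and `random_walk_mean` are hypothetical. It is tested on commuting (diagonal) matrices, where by (P1) the weighted least squares mean is $A_1^{w_1}\cdots A_n^{w_n}$, so the limit is known in closed form.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def weighted_geo_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = spd_fun(A, np.sqrt)
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    C = A_inv_half @ B @ A_inv_half
    return A_half @ spd_fun((C + C.T) / 2, lambda w: w ** t) @ A_half

def random_walk_mean(mats, weights, steps, rng):
    """Run s_1 = x_1, s_k = s_{k-1} #_{1/k} x_k with x_k drawn i.i.d. from the weights."""
    draws = rng.choice(len(mats), size=steps, p=weights)
    s = mats[draws[0]]
    for k in range(2, steps + 1):
        s = weighted_geo_mean(s, mats[draws[k - 1]], 1.0 / k)
    return s

mats = [np.diag([1.0, 4.0]), np.diag([2.0, 1.0]), np.diag([4.0, 2.0])]
weights = [0.5, 0.3, 0.2]

approx = random_walk_mean(mats, weights, steps=5000, rng=np.random.default_rng(0))
# Closed form for commuting matrices: entrywise weighted geometric mean.
exact = np.diag(np.exp(sum(w * np.log(np.diag(A)) for w, A in zip(weights, mats))))
log_error = np.max(np.abs(np.log(np.diag(approx)) - np.log(np.diag(exact))))
print(log_error)
```

Since convergence holds only almost surely and at a statistical rate, the error should be read as a Monte Carlo error that shrinks slowly with the number of steps.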
4. Random variables and barycenters

In recent years significant portions of the classical theory of real-valued random
variables on a probability space have been successfully generalized to the setting in
which the random variables take values in a metric space $M$. We quickly recall
some of this theory as worked out, for example, by Es-Sahib and Heinrich [11] and
particularly by Sturm [28].

Let $(\Omega, \mathcal{A}, \sigma)$ be a probability space: a set $\Omega$ equipped with
a $\sigma$-algebra $\mathcal{A}$ of subsets, and a $\sigma$-additive probability measure
$\sigma$ on $\mathcal{A}$. We typically write the measure or probability of
$A \in \mathcal{A}$ by $P(A)$ instead of $\sigma(A)$. For a metric space $(M,d)$, an
$M$-valued random variable is a function $X : \Omega \to M$ which is measurable in the
sense that $X^{-1}(B) \in \mathcal{A}$ for every Borel subset $B$ of $M$. We further impose
the technically useful assumption that the image $X(\Omega)$ is a separable subspace of $M$.

The push-forward of the measure $\sigma$ by $X$ is denoted $q_X$ and defined by
$q_X(B) = \sigma(X^{-1}(B))$ for each Borel subset $B$ of $M$. It is a probability measure
on the Borel sets of $M$ and is called the distribution of $X$. A sequence of random
variables $\{X_n\}$ is identically distributed (i.d.) if all have the same distribution.
For any $q_X$-integrable function $\varphi : M \to \mathbb{R}$, one has the basic formula
$\int_M \varphi \, dq_X = \int_\Omega \varphi \circ X \, d\sigma$.

A collection of random variables $\{X_i : i \in I\}$ is independent if for every finite
$F \subseteq I$ of cardinality at least two,
$P\bigl(\bigcap_{i\in F} X_i^{-1}(B_i)\bigr) = \prod_{i\in F} P\bigl(X_i^{-1}(B_i)\bigr)$,
where $\{B_i : i \in F\}$ is any collection of Borel subsets of $M$. A sequence $\{X_n\}$ is
i.i.d. if it is both independent and identically distributed.

Assume henceforth that $M$ is an NPC-space. Let $\mathcal{P}(M)$ denote the set of
probability measures with separable support on $(M,\mathcal{B}(M))$, where
$\mathcal{B}(M)$ is the collection of Borel sets. We define the collection
$\mathcal{P}^\theta(M)$ of probability measures $q \in \mathcal{P}(M)$ to be those
satisfying $\int_M d^\theta(z,x)\,q(dx) < \infty$ for all $z \in M$. Members of
$\mathcal{P}^1(M)$ are called integrable and those in $\mathcal{P}^2(M)$ are called
square integrable. We define a random variable $X : \Omega \to M$ to be in $L^\theta$ if
its distribution $q_X \in \mathcal{P}^\theta(M)$. In particular, it is integrable if
$\int_\Omega d(z,X(\omega))\,\sigma(d\omega) = \int_M d(z,x)\,q_X(dx) < \infty$ for
$z \in M$. We define a sequence $\{X_n\}$ of random variables to be uniformly bounded if
there exist $z \in M$ and $R > 0$ such that the image of each $X_n$ lies within the ball
around $z$ of radius $R$.
Following Sturm [28], we define the barycenter $b(q)$ of $q \in \mathcal{P}^1(M)$ by

$$b(q) = \operatorname*{arg\,min}_{z\in M} \int_M \bigl[d^2(z,x) - d^2(y,x)\bigr]\,q(dx). \tag{4.7}$$

Sturm uses the uniform convexity of the metric to show that independent of $y$ there is
a unique $z = b(q)$, the barycenter (by definition), at which this minimum is attained
[28, Proposition 4.3], and that for the case that $q$ is square integrable the barycenter
can be alternatively characterized by

$$b(q) = \operatorname*{arg\,min}_{z\in M} \int_M d^2(z,x)\,q(dx). \tag{4.8}$$

Remark 4.1. For the case that $q = \sum_{i=1}^{n} w_i\delta_{x_i}$, where
$(w_1,\dots,w_n)$ is a weight and $\delta_{x_i}$ is the point mass at $x_i$, we have

$$b(q) = \operatorname*{arg\,min}_{z\in M} \int_M d^2(z,x)\,q(dx) = \operatorname*{arg\,min}_{z\in M} \sum_{i=1}^{n} w_i\,d^2(z,x_i) = \mathfrak{G}_n(w_1,\dots,w_n; x_1,\dots,x_n).$$

Thus in this case $q$ is square integrable and its barycenter $b(q)$ agrees with the weighted
least squares mean of $(x_1,\dots,x_n)$.

For $X : \Omega \to M$ integrable, we define its expected value $EX$ by

$$EX = \operatorname*{arg\,min}_{z\in M} \int_\Omega \bigl[d^2(z,X(\omega)) - d^2(y,X(\omega))\bigr]\,\sigma(d\omega) = \operatorname*{arg\,min}_{z\in M} \int_M \bigl[d^2(z,x) - d^2(y,x)\bigr]\,q_X(dx) = b(q_X). \tag{4.9}$$

From this definition it is clear that integrable i.d. random variables have the same
expectation.

It is also possible to formulate and prove a Law of Large Numbers for a
sequence of i.i.d. random variables into a metric space $M$. Let $\{X_n : n \in \mathbb{N}\}$
be a sequence of independent, identically distributed random variables on some probability
space $(\Omega, \mathcal{A}, \sigma)$ into $M$. Let $\mu_n$ be an $n$-mean on $M$ for each
$n$, for example one obtained by the symmetrization procedure or least squares. We use
these means to form the "average" $Y_n$ of the given random variables according to the
rule $Y_n(\omega) := \mu_n(X_1(\omega),\dots,X_n(\omega))$. Now under suitable hypotheses
Es-Sahib and Heinrich [11] and Sturm [28] show that a strong law of large numbers is
satisfied, that is, the $Y_n$ converge pointwise a.e. to a common point $b$. The principal
result of Sturm [28, Theorem 4.7] is crucial for our purposes.
Theorem 4.2. Let $\{X_n\}_{n\in\mathbb{N}}$ be a sequence of uniformly bounded i.i.d.
random variables from a probability space $(\Omega,\mathcal{A},\sigma)$ into an NPC space
$M$. Let $S_n$ denote the inductive mean for each $n \ge 2$, and set
$Y_n(\omega) = S_n(X_1(\omega),\dots,X_n(\omega))$. Then $Y_n(\omega) \to EX_1$ as
$n \to \infty$ for almost all $\omega \in \Omega$.

5. A basic construction

In this section we specialize Theorem 4.2 to the case of finitely supported
probability measures, the case of interest to us. We first recall a standard construction
of probability theory. For each $k \in \mathbb{N}$, let $\Omega_k$ denote a copy of an
$n$-element set labelled $\{\xi_1,\dots,\xi_n\}$ equipped with the probability measure
$\sigma_k = \sum_{i=1}^{n} w_i\delta_{\xi_i}$, where $(w_1,\dots,w_n)$ is a weight. We set
$\Omega = \prod_{k=1}^{\infty}\Omega_k$. We call a subset of $\Omega$ a box if it is of the
form $A = \prod_{k=1}^{\infty} A_k$, where $\emptyset \ne A_k \subseteq \Omega_k$, and set
$\sigma(A) = \prod_{k=1}^{\infty}\sigma_k(A_k)$. It is a standard result of measure theory
that $\sigma$ uniquely extends to a probability measure, called the product measure and
again denoted $\sigma$, on the $\sigma$-algebra $\mathcal{A}$ generated by the boxes.

Now let $(M,d)$ be an NPC space, let $\{x_1,\dots,x_n\} \subseteq M$, and let
$w = (w_1,\dots,w_n)$ be a weight. For each positive integer $k$, let
$X_k : \Omega \to M$ be defined by $X_k(\omega) = x_i$ if $\pi_k(\omega) = \xi_i$, where
$\pi_k : \Omega \to \Omega_k$ is projection onto the $k$-th coordinate. It is
straightforward to verify from the definition of the product measure that the sequence
$\{X_k\}$ of random variables is independent. Furthermore, each $X_k$ has distribution
$\sum_{i=1}^{n} w_i\delta_{x_i}$, and hence the sequence is identically distributed.

We define $Y_k : \Omega \to M$ for each $k$ by
$Y_k(\omega) = S_k(X_1(\omega),\dots,X_k(\omega))$, where $S_k$ is the inductive mean. By
Theorem 4.2 we have that $\lim_{k\to\infty} Y_k(\omega) = EX_1 = b(q_{X_1})$ a.e. From
Remark 4.1 it follows that $\lim_{k\to\infty} Y_k(\omega) = \mathfrak{G}_n(w; x_1,\dots,x_n)$
a.e. We summarize this special case of Theorem 4.2.



Corollary 5.1. Let $(M,d)$ be an NPC space, let $\{x_1,\dots,x_n\} \subseteq M$, and let
$w = (w_1,\dots,w_n)$ be a weight. Then
$\lim_{k\to\infty} Y_k(\omega) = \mathfrak{G}_n(w; x_1,\dots,x_n)$ a.e. for the $\{Y_k\}$
given in the preceding construction.

We consider a basic example. 

Proposition 5.2. Let $\mathcal{H}$ be a Hilbert space endowed with the metric induced by the
inner product. Then

(i) $\mathcal{H}$ is an NPC space.

(ii) The binary $t$-weighted mean of $x$ and $y$ is given by $(1-t)x + ty$.

(iii) The inductive mean is given by $S_n(x_1,\dots,x_n) = \sum_{i=1}^{n} (1/n)\,x_i$.

(iv) The weighted least squares mean for weight $w = (w_1,\dots,w_n)$ is given by
$\mathfrak{G}_n(w; x_1,\dots,x_n) = \sum_{i=1}^{n} w_ix_i$.

(v) For $\{X_k\}_{k\in\mathbb{N}}$ and a weight $w = (w_1,\dots,w_n)$ as given in the
preceding construction, we have a.e.

$$\lim_{k\to\infty} \sum_{i=1}^{k} (1/k)\,X_i(\omega) = \mathfrak{G}_n(w; x_1,\dots,x_n) = \sum_{i=1}^{n} w_ix_i.$$

i=\ i=\ 

Proof. (i) It is standard that Hilbert spaces satisfy the parallelogram law, hence the
semiparallelogram law (2.3), and hence are NPC spaces (see e.g. [28, Proposition
3.5]).

(ii) The map on $[0,1]$ given by $t \mapsto (1-t)x + ty$ is a metric geodesic taking
$0$ to $x$ and $1$ to $y$. Since such geodesics are unique in NPC spaces, it must give the
$t$-weighted mean.

(iii) By definition and induction

$$S_n(x_1,\dots,x_n) = S_{n-1}(x_1,\dots,x_{n-1})\#_{1/n}\,x_n = \frac{n-1}{n}\,S_{n-1}(x_1,\dots,x_{n-1}) + \frac{1}{n}\,x_n = \frac{n-1}{n}\sum_{i=1}^{n-1}\frac{1}{n-1}\,x_i + \frac{1}{n}\,x_n = \sum_{i=1}^{n}\frac{1}{n}\,x_i.$$

(iv) Consider the measure $q = \sum_{i=1}^{n} w_i\delta_{x_i}$. Then for any $y \in \mathcal{H}$,

$$\langle \mathfrak{G}_n(w; x_1,\dots,x_n), y\rangle = \langle b(q), y\rangle = \int_{\mathcal{H}} \langle x,y\rangle\,q(dx) = \sum_{i=1}^{n} w_i\langle x_i,y\rangle = \Bigl\langle \sum_{i=1}^{n} w_ix_i,\, y\Bigr\rangle,$$

where the first equality follows from Remark 4.1 and the second is the content of
[28, Proposition 5.4]. The conclusion of (iv) is now immediate.

(v) In the earlier construction of this section we have
$Y_k(\omega) = \sum_{i=1}^{k} (1/k)\,X_i(\omega)$ by part (iii). The conclusion of (v) then
follows from Corollary 5.1 and (iv). □

6. Monotonicity and Loewner-Heinz NPC spaces

The fundamental Loewner-Heinz inequality for positive definite matrices asserts
that $A^{1/2} \le B^{1/2}$ whenever $A \le B$. This can be written alternatively as
$A\#I \le B\#I$ whenever $A \le B$ and extends to the equivalent monotonicity property
that $A_1\#A_2 \le B_1\#B_2$ whenever $A_1 \le B_1$ and $A_2 \le B_2$. These
considerations motivate the next definition.

Definition 6.1. A Loewner-Heinz NPC space is an NPC space equipped with a closed
partial order $\le$ satisfying $x_1\#x_2 \le y_1\#y_2$ whenever $x_i \le y_i$ for $i = 1, 2$.

A mean $\mu : M^n \to M$ on a partially ordered metric space is called order-preserving
or monotonic if $x_i \le y_i$ for $i = 1,\dots,n$ implies
$\mu(x_1,\dots,x_n) \le \mu(y_1,\dots,y_n)$.

Lemma 6.2. The inductive mean $S_n$ on a Loewner-Heinz NPC space is monotonic
for every $n \ge 2$.

Proof. We first observe that $x_1\#_t x_2 \le y_1\#_t y_2$ whenever $x_i \le y_i$ for
$i = 1,2$: by the standard argument, one extends the inequality from the midpoint to the
dyadic weighted means by induction on the dyadic rationals, and then to general
$t \in [0,1]$ by continuity in $t$ and the closedness of the relation $\le$. Assuming that
the inductive $k$-mean $S_k$ is monotonic, it follows that
$S_{k+1}(x_1,\dots,x_{k+1}) = S_k(x_1,\dots,x_k)\#_{1/(k+1)}\,x_{k+1}$
is monotonic since $S_k$ and the $t$-weighted mean both are. □
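Lemma 6.2 can be spot-checked numerically for positive definite matrices with the Loewner order. The following sketch is only a sanity check on randomly generated data, not a proof; it assumes real symmetric matrices and NumPy, and the helper names (`weighted_geo_mean`, `inductive_mean`, `random_psd`) are illustrative. It builds tuples with $B_i \le A_i$ and confirms that $S_k(A_1,\dots,A_k) - S_k(B_1,\dots,B_k)$ has no significantly negative eigenvalue.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def weighted_geo_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = spd_fun(A, np.sqrt)
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    C = A_inv_half @ B @ A_inv_half
    return A_half @ spd_fun((C + C.T) / 2, lambda w: w ** t) @ A_half

def inductive_mean(mats):
    """S_2(x, y) = x # y and S_k(x_1..x_k) = S_{k-1}(x_1..x_{k-1}) #_{1/k} x_k."""
    s = mats[0]
    for k in range(2, len(mats) + 1):
        s = weighted_geo_mean(s, mats[k - 1], 1.0 / k)
    return s

def random_psd(n, rng):
    """A random positive semidefinite n x n matrix."""
    G = rng.standard_normal((n, n))
    return G @ G.T

rng = np.random.default_rng(1)
Bs = [random_psd(3, rng) + np.eye(3) for _ in range(4)]
As = [B + random_psd(3, rng) for B in Bs]          # B_i <= A_i in the Loewner order

gap = inductive_mean(As) - inductive_mean(Bs)
min_eig_gap = np.linalg.eigvalsh((gap + gap.T) / 2).min()
print(min_eig_gap)
```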

Theorem 6.3. Let $(M,d,\le)$ be a Loewner-Heinz NPC space. Then for a fixed weight
$w = (w_1,\dots,w_n)$ the weighted least squares mean $\mathfrak{G}_n(w;\cdot)$ is monotonic
for $n \ge 2$.

Proof. Assuming $x_i \le y_i$ for $1 \le i \le n$, we show
$\mathfrak{G}_n(w; x_1,\dots,x_n) \le \mathfrak{G}_n(w; y_1,\dots,y_n)$, where
$\mathfrak{G}_n$ is the least squares mean on $M^n$. Let $\Omega_k$ be a copy of the
$n$-element set $\{\xi_1,\dots,\xi_n\}$ equipped with the measure
$\sum_{i=1}^{n} w_i\delta_{\xi_i}$. Let $\Omega = \prod_{k=1}^{\infty}\Omega_k$ be the
countable product of the $\Omega_k$ with the product measure. Let
$X_k : \Omega \to M$ be defined by $X_k(\omega) = x_i$ if $\pi_k(\omega) = \xi_i$, where
$\pi_k : \Omega \to \Omega_k$ is projection onto the $k$-th coordinate. Similarly we define
$\tilde{X}_k : \Omega \to M$ by $\tilde{X}_k(\omega) = y_i$ if $\pi_k(\omega) = \xi_i$. As
we have seen in the previous section $\{X_k\}$ is i.i.d. with distribution
$\sum_{i=1}^{n} w_i\delta_{x_i}$, while $\{\tilde{X}_k\}$ is i.i.d. with distribution
$\sum_{i=1}^{n} w_i\delta_{y_i}$. Finally we note that $(X_1(\omega),\dots,X_k(\omega))$ is
coordinatewise less than or equal to $(\tilde{X}_1(\omega),\dots,\tilde{X}_k(\omega))$
since $x_i \le y_i$ for each $i = 1,\dots,n$.

We define $Y_k, \tilde{Y}_k : \Omega \to M$ by
$Y_k(\omega) = S_k(X_1(\omega),\dots,X_k(\omega))$ and
$\tilde{Y}_k(\omega) = S_k(\tilde{X}_1(\omega),\dots,\tilde{X}_k(\omega))$. It follows from
Lemma 6.2 that $Y_k(\omega) \le \tilde{Y}_k(\omega)$ for each $\omega \in \Omega$. By
Corollary 5.1 we have that $\lim_{k\to\infty} Y_k = \mathfrak{G}_n(w; x_1,\dots,x_n)$ a.e.
and $\lim_{k\to\infty} \tilde{Y}_k = \mathfrak{G}_n(w; y_1,\dots,y_n)$ a.e. By the
closedness of the partial order (and the fact that the intersection of two sets of
measure 1 still has measure 1), we conclude that
$\mathfrak{G}_n(w; x_1,\dots,x_n) \le \mathfrak{G}_n(w; y_1,\dots,y_n)$. □

Since the trace metric on the space P of m x m positive definite (real or complex) 
matrices makes it a Loewner-Heinz NPC space with respect to the Loewner order 
(see e.g. [19]), we have the following corollary. 

Corollary 6.4. The weighted least squares mean on the set P of positive definite 
matrices is monotonic. 



Remark 6.5. Loewner [2l] proved that a function defined on an open interval is
operator monotone if and only if it allows an analytic continuation into the complex
upper half-plane with nonnegative imaginary part. The function $f(t) = t^a$, $a \in [0,1]$,
is operator monotone on the positive reals, that is, $X \le Y$ implies $X^a \le Y^a$ for
positive definite matrices $X$ and $Y$. The inequality was independently proved by
Heinz [H]. It is equivalent to the extended monotonicity property of the weighted
geometric mean: $B_1\#_tB_2 \le A_1\#_tA_2$, $t \in [0,1]$, whenever $B_1 \le A_1$ and
$B_2 \le A_2$. It is natural to consider the monotonicity of the least squares mean,
$\mathfrak{G}_n(\omega;\mathbb{B}) \le \mathfrak{G}_n(\omega;\mathbb{A})$ whenever
$B_i \le A_i$ for each $i$, as an $n$-variable Loewner-Heinz inequality for positive
definite matrices.

A function $F : \mathbb{P}^n \to \mathbb{P}$ is jointly concave if for any
$(A_1,\dots,A_n), (B_1,\dots,B_n) \in \mathbb{P}^n$ and $0 \le t \le 1$, we have

$$tF(A_1,\dots,A_n) + (1-t)F(B_1,\dots,B_n) \le F(tA_1 + (1-t)B_1,\dots,tA_n + (1-t)B_n).$$

Proposition 6.6. The least squares mean $\mathfrak{G}_n : \mathbb{P}^n \to \mathbb{P}$ for
the trace metric is jointly concave for each $n \ge 2$.

Proof. It is a standard result that the two-variable weighted geometric mean on
$\mathbb{P}$ is jointly concave. Since it is also monotone, it follows directly by
induction that the inductive mean $S_n$ of positive definite matrices is jointly concave
for $n \ge 2$.

Fix $(A_1,\dots,A_n), (B_1,\dots,B_n) \in \mathbb{P}^n$ and a weight
$w = (w_1,\dots,w_n)$. Construct random variables $\{X_k\}$, $\{\tilde{X}_k\}$ as in the
proof of Theorem 6.3 with $A_i$ replacing $x_i$ and $B_i$ replacing $y_i$ for each $i$. For
$Y_k = S_k(X_1,\dots,X_k)$ and $\tilde{Y}_k = S_k(\tilde{X}_1,\dots,\tilde{X}_k)$,
we conclude from the concavity of $S_k$ that

$$tY_k + (1-t)\tilde{Y}_k \le S_k\bigl(tX_1 + (1-t)\tilde{X}_1,\dots,tX_k + (1-t)\tilde{X}_k\bigr) = S_k(Z_1,\dots,Z_k),$$

where $Z_i = tX_i + (1-t)\tilde{X}_i$ for $1 \le i \le k$. Note that the $Z_k$ are i.i.d.
with each $Z_k$ having as distribution the probability measure that assigns mass $w_i$ to
each $C_i := tA_i + (1-t)B_i$, $1 \le i \le n$. From Corollary 5.1 the limit of both sides
exists a.e. and is given by the appropriate least squares mean, and from the closedness
of the order we conclude

$$t\,\mathfrak{G}_n(w; A_1,\dots,A_n) + (1-t)\,\mathfrak{G}_n(w; B_1,\dots,B_n) \le \mathfrak{G}_n(w; C_1,\dots,C_n),$$

where $C_i = tA_i + (1-t)B_i$ for each $i$. □

7. Other properties of the least squares mean 

The fact that the unweighted least squares mean is bounded above by the arithmetic
mean, and hence below by the harmonic mean, has recently been shown by Yamazaki
[29]. We give an alternative approach via probabilistic methods and derive the result
for the weighted least squares mean.

Proposition 7.1. For $(A_1,\dots,A_n) \in \mathbb{P}^n$ and a weight
$w = (w_1,\dots,w_n)$, we have

$$\Bigl(\sum_{i=1}^{n} w_iA_i^{-1}\Bigr)^{-1} \le \mathfrak{G}_n(w; A_1,\dots,A_n) \le \sum_{i=1}^{n} w_iA_i.$$

Proof. It is a standard result that the two-variable weighted geometric mean on
$\mathbb{P}$ is below the corresponding weighted arithmetic mean:
$A\#_tB \le (1-t)A + tB$ for $0 \le t \le 1$. It follows by induction that the inductive
mean satisfies for each $k$

$$S_k(B_1,\dots,B_k) = S_{k-1}(B_1,\dots,B_{k-1})\#_{1/k}\,B_k \le \frac{k-1}{k}\sum_{i=1}^{k-1}\frac{1}{k-1}\,B_i + \frac{1}{k}\,B_k = \frac{1}{k}\sum_{i=1}^{k} B_i.$$

Construct a sequence of i.i.d. random variables $\{X_k\}$ as in Section 5 such that the
distribution is $\sum_{i=1}^{n} w_i\delta_{A_i}$ for each $X_k$. Set
$Y_k = S_k(X_1,\dots,X_k)$ for each $k$. From Corollary 5.1,
$\lim_{k\to\infty} Y_k(\omega) = \mathfrak{G}_n(w; A_1,\dots,A_n)$ a.e.

Endow the space of Hermitian matrices $\mathbb{H}$ containing $\mathbb{P}$ with the
Hilbert space structure with inner product $\langle A,B\rangle = \operatorname{tr} A^*B$.
Then $\mathbb{P}$ is an open subset of $\mathbb{H}$. Set
$Z_k = \sum_{i=1}^{k} (1/k)\,X_i$, where $\{X_k\}$ are the random variables of the previous
paragraph. By Proposition 5.2(v), $\lim_{k\to\infty} Z_k(\omega) = \sum_{i=1}^{n} w_iA_i$
a.e. By the first paragraph $Y_k(\omega) \le Z_k(\omega)$ for all $k, \omega$. From the
closedness of the order, we conclude that
$\mathfrak{G}_n(w; A_1,\dots,A_n) \le \sum_{i=1}^{n} w_iA_i$.

The first inequality in the conclusion of the proposition follows from the second
and the fact that inversion is an isometry for the trace metric and hence preserves
the least squares mean. □
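The induction step of the preceding proof, together with its harmonic counterpart obtained by inversion, can be spot-checked numerically for the inductive mean. The sketch below is a sanity check on random data under the assumption of real symmetric positive definite inputs; it uses NumPy, and the helper names are illustrative rather than taken from the paper. The lower bound relies on the identity $(A\#_tB)^{-1} = A^{-1}\#_tB^{-1}$, which gives $S_k(B_1,\dots,B_k)^{-1} = S_k(B_1^{-1},\dots,B_k^{-1})$.

```python
import numpy as np

def spd_fun(A, f):
    """Apply a scalar function f to a symmetric positive definite matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def weighted_geo_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    A_half = spd_fun(A, np.sqrt)
    A_inv_half = spd_fun(A, lambda w: w ** -0.5)
    C = A_inv_half @ B @ A_inv_half
    return A_half @ spd_fun((C + C.T) / 2, lambda w: w ** t) @ A_half

def inductive_mean(mats):
    """S_k(x_1..x_k) = S_{k-1}(x_1..x_{k-1}) #_{1/k} x_k, with S_1(x) = x."""
    s = mats[0]
    for k in range(2, len(mats) + 1):
        s = weighted_geo_mean(s, mats[k - 1], 1.0 / k)
    return s

def min_eig(S):
    """Smallest eigenvalue of the symmetric part of S."""
    return np.linalg.eigvalsh((S + S.T) / 2).min()

rng = np.random.default_rng(2)
mats = []
for _ in range(5):
    G = rng.standard_normal((3, 3))
    mats.append(G @ G.T + np.eye(3))              # well-conditioned SPD matrices

S = inductive_mean(mats)
arithmetic = sum(mats) / len(mats)
harmonic = np.linalg.inv(sum(np.linalg.inv(B) for B in mats) / len(mats))

upper_gap = min_eig(arithmetic - S)               # expected >= 0
lower_gap = min_eig(S - harmonic)                 # expected >= 0
print(upper_gap, lower_gap)
```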

Let $M$ be an NPC space. Given probability measures $p, q \in \mathcal{P}(M)$, we say
that a probability measure $\mu \in \mathcal{P}(M^2)$ is a coupling of $p$ and $q$ if the
marginals of $\mu$ are $p$ and $q$, that is, if for all Borel sets $B \in \mathcal{B}(M)$

$$\mu(B \times M) = p(B) \quad\text{and}\quad \mu(M \times B) = q(B). \tag{7.10}$$

Definition 7.2. The ($L^1$-)Wasserstein distance $W$ on $\mathcal{P}^1(M)$ is given by

$$W(p,q) = \inf\Bigl\{ \int_{M\times M} d(x,y)\,\mu(dx\,dy) : \mu \text{ is a coupling of } p \text{ and } q \Bigr\}.$$

We adopt the most common name for the metric, the Wasserstein distance, although
it also appears under a variety of other names such as the Kantorovich-Rubinstein
distance.

Proposition 7.3. For $(x_1,\dots,x_n), (y_1,\dots,y_n) \in M^n$, a weight
$w = (w_1,\dots,w_n)$, and the corresponding finitely supported probability measures
$q_1 = \sum_{i=1}^{n} w_i\delta_{x_i}$ and $q_2 = \sum_{i=1}^{n} w_i\delta_{y_i}$, we have

$$d(\mathfrak{G}_n(w; x_1,\dots,x_n), \mathfrak{G}_n(w; y_1,\dots,y_n)) \le W(q_1,q_2) \le \sum_{i=1}^{n} w_i\,d(x_i,y_i).$$

Hence, in particular, the least squares mean $\mathfrak{G}_n$ is continuous for each $w$.

Proof. Define $\mu$ on $M \times M$ by $\mu = \sum_{i=1}^n w_i\delta_{(x_i,y_i)}$. One sees readily that $\mu$ is a coupling of $q_1$ and $q_2$, and thus $W(q_1,q_2) \le \int_{M\times M} d(x,y)\,\mu(dx\,dy) = \sum_{i=1}^n w_i\,d(x_i,y_i)$. By Theorem 6.3 of [28], the barycentric map $b : \mathcal{P}^1(M) \to M$ satisfies for all $p, q$ the fundamental contraction property $d(b(p), b(q)) \le W(p,q)$. By Remark |1]T], $b(q_1) = \mathfrak{G}_n(w; x_1,\dots,x_n)$ and similarly $b(q_2) = \mathfrak{G}_n(w; y_1,\dots,y_n)$. Thus

\[
d(\mathfrak{G}_n(w; x_1,\dots,x_n), \mathfrak{G}_n(w; y_1,\dots,y_n)) = d(b(q_1), b(q_2)) \le W(q_1,q_2) \le \sum_{i=1}^n w_i\,d(x_i,y_i).
\]

Since the left-hand side is bounded above by the right-hand side, the continuity of $\mathfrak{G}_n$ follows directly. □
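The outer inequality $d(\mathfrak{G}_n(w; x), \mathfrak{G}_n(w; y)) \le \sum_i w_i\,d(x_i,y_i)$ can be observed numerically on positive definite matrices with the trace metric. The sketch below approximates the least squares mean by Riemannian gradient descent on $x \mapsto \sum_i w_i d^2(x, a_i)$. This is a substitute technique chosen for the illustration, not the probabilistic construction used in this paper. The step size, iteration count, sample matrices, and all function names are assumptions.

```python
import numpy as np


def sym_fun(A, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T


def dist(A, B):
    """Trace-metric distance: the Frobenius norm of log(A^{-1/2} B A^{-1/2})."""
    Aih = sym_fun(A, lambda v: 1.0 / np.sqrt(v))
    M = Aih @ B @ Aih
    lam = np.linalg.eigvalsh((M + M.T) / 2)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def karcher_mean(mats, w, step=0.5, iters=200):
    """Approximate the weighted least squares mean by Riemannian gradient descent."""
    X = sum(wi * A for wi, A in zip(w, mats))  # start at the arithmetic mean
    for _ in range(iters):
        Xh = sym_fun(X, np.sqrt)
        Xih = sym_fun(X, lambda v: 1.0 / np.sqrt(v))
        G = np.zeros_like(X)
        for wi, A in zip(w, mats):
            M = Xih @ A @ Xih
            G += wi * sym_fun((M + M.T) / 2, np.log)
        # Move along the geodesic from X in the direction of the negative gradient.
        X = Xh @ sym_fun(step * G, np.exp) @ Xh
        X = (X + X.T) / 2
    return X


# Hypothetical sample data: two triples of 2 x 2 positive definite matrices.
w = [0.2, 0.3, 0.5]
xs = [np.array([[2.0, 1.0], [1.0, 3.0]]),
      np.array([[1.0, 0.2], [0.2, 0.5]]),
      np.array([[4.0, -1.0], [-1.0, 2.0]])]
ys = [np.array([[2.5, 0.5], [0.5, 3.0]]),
      np.array([[1.2, 0.0], [0.0, 0.7]]),
      np.array([[3.0, -1.0], [-1.0, 2.5]])]

lhs = dist(karcher_mean(xs, w), karcher_mean(ys, w))
rhs = sum(wi * dist(x, y) for wi, x, y in zip(w, xs, ys))
print(lhs, rhs)
```

The right-hand side is the cost of the coupling $\mu = \sum_i w_i\delta_{(x_i,y_i)}$ used in the proof, so the printed pair illustrates the bound without computing $W(q_1,q_2)$ itself.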
From this result together with Corollary 6.4 and Propositions 6.6 and 7.1 we conclude that the least squares mean of positive definite matrices satisfies the continuity, monotonicity, joint concavity, and AGM inequality properties, and hence all the fundamental properties of the geometric means of positive definite matrices defined for and satisfied by the ALM and BMP constructions [2], [9]; see [29] for other properties.

8. Appendix: The least squares mean on symmetric cones 

In this section, we shall see that the techniques and results from the probabilistic treatment of the least squares mean for positive definite matrices carry over, typically with little change, to the case of symmetric cones. We first briefly describe (following mostly [12]) some Jordan-algebraic concepts pertinent to our purpose. A Jordan algebra $V$ over $\mathbb{R}$ is a finite-dimensional commutative algebra with identity $e$ satisfying $x^2(xy) = x(x^2y)$ for all $x, y \in V$. For $x \in V$, let $L(x)$ be the linear operator defined by $L(x)y = xy$, and let $P(x) = 2L(x)^2 - L(x^2)$. The map $P$ is called the quadratic representation of $V$. An element $x \in V$ is said to be invertible if there exists an element $x^{-1}$ in the subalgebra generated by $x$ and $e$ such that $xx^{-1} = e$.

An element $c \in V$ is called an idempotent if $c^2 = c$. We say that $c_1,\dots,c_k$ is a complete system of orthogonal idempotents if $c_i^2 = c_i$, $c_ic_j = 0$ for $i \ne j$, and $c_1 + \cdots + c_k = e$. An idempotent is primitive if it is non-zero and cannot be written as the sum of two non-zero idempotents. A Jordan frame is a complete system of orthogonal primitive idempotents.

A Jordan algebra $V$ is said to be Euclidean if there exists an inner product $\langle\cdot,\cdot\rangle$ such that for all $x, y, z \in V$:

\[
\langle xy, z\rangle = \langle y, xz\rangle. \tag{8.11}
\]

The following spectral theorem for Euclidean Jordan algebras appears in [12].

Theorem 8.1. Any two Jordan frames in a Euclidean Jordan algebra $V$ have the same number of elements (called the rank of $V$, denoted $\operatorname{rank}(V)$). Given $x \in V$, there exists a Jordan frame $c_1,\dots,c_r$ and real numbers $\lambda_1,\dots,\lambda_r$ such that

\[
x = \sum_{i=1}^r \lambda_i c_i.
\]

Definition 8.2. Let $V$ be a Euclidean Jordan algebra of $\operatorname{rank}(V) = r$. The spectral mapping $\lambda : V \to \mathbb{R}^r$ is defined by $\lambda(x) = (\lambda_1(x),\dots,\lambda_r(x))$, where the $\lambda_i(x)$'s are eigenvalues of $x$ (with multiplicities) as in Theorem 8.1 in non-increasing order $\lambda_{\max}(x) = \lambda_1(x) \ge \lambda_2(x) \ge \cdots \ge \lambda_r(x) = \lambda_{\min}(x)$.

We define $\det(x) = \prod_{i=1}^r \lambda_i(x)$ and $\operatorname{tr}(x) = \sum_{i=1}^r \lambda_i(x)$. Then $\operatorname{tr}$ is a linear form on $V$ and $\det$ is a homogeneous polynomial of degree $r$ on $V$.

The trace inner product $\langle x, y\rangle = \operatorname{tr}(xy)$ in a Euclidean Jordan algebra satisfies (8.11). We will assume that $V$ is a Euclidean Jordan algebra of rank $r$ and equipped with the trace inner product $\langle x, y\rangle = \operatorname{tr}(xy)$. Let $Q$ be the set of all square elements of $V$. Then $Q$ is a closed convex cone of $V$ with $Q \cap -Q = \{0\}$, and is the set of elements $x \in V$ such that $L(x)$ is positive semi-definite. It turns out that $Q$ has non-empty interior $\Omega$, and $\Omega$ is a symmetric cone, that is, the group $G(\Omega) = \{g \in GL(V) \mid g(\Omega) = \Omega\}$ acts transitively on it and $\Omega$ is a self-dual cone with respect to the inner product $\langle\cdot,\cdot\rangle$. Furthermore, for any $a$ in $\Omega$, $P(a) \in G(\Omega)$ and is positive definite. We note that any symmetric cone (self-dual, homogeneous open convex cone) can be realized as an interior of squares in an appropriate Euclidean Jordan algebra [12].

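We remark that $\det(P(a^{1/2})b) = \det(a)\det(b)$ for all $a, b \in \Omega$ and $\operatorname{tr}(\log a) = \log\det(a)$ for all $a \in \Omega$, where $a^{1/2} = \sum_{i=1}^r \lambda_i^{1/2}c_i$, $\log a = \sum_{i=1}^r (\log\lambda_i)c_i$, and $a = \sum_{i=1}^r \lambda_ic_i$ is a spectral decomposition of $a$ ([12]).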

Proposition 8.3. The symmetric cone $\Omega \subset V$ has the following properties:

\[
\Omega = \{x^2 : x \text{ is invertible}\} = \{x : L(x) \text{ is positive definite}\} = \{x : \lambda_{\min}(x) > 0\}.
\]

The space $\mathbb{H}_m$ of $m \times m$ Hermitian matrices equipped with the trace inner product $\langle X, Y\rangle = \operatorname{tr}(X^*Y)$ and the Jordan product $X \circ Y = \frac{1}{2}(XY + YX)$ is a typical example of a Euclidean Jordan algebra. In this case the corresponding symmetric cone is $\mathbb{P}_m$, the convex cone of $m \times m$ positive definite Hermitian matrices, and the quadratic representation is given by $P(X)Y = XYX$.
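The matrix example lends itself to a direct numerical check. The following sketch, restricted to real symmetric $3 \times 3$ matrices for simplicity (an assumption of the illustration), tests the Jordan identity, the formula $P(X)Y = XYX$ derived from $P(x) = 2L(x)^2 - L(x^2)$, and the fact that the rank-one eigenprojections of $X$ form a Jordan frame giving its spectral decomposition. The function names `jordan` and `quad_rep` are hypothetical.

```python
import numpy as np


def jordan(X, Y):
    """Jordan product X o Y = (XY + YX) / 2 on symmetric matrices."""
    return (X @ Y + Y @ X) / 2


def quad_rep(X, Y):
    """P(X)Y = 2 L(X)^2 Y - L(X^2) Y, where L(X)Y = X o Y."""
    return 2 * jordan(X, jordan(X, Y)) - jordan(jordan(X, X), Y)


rng = np.random.default_rng(1)
G = rng.standard_normal((3, 3))
H = rng.standard_normal((3, 3))
X = (G + G.T) / 2  # random real symmetric matrices (hypothetical test data)
Y = (H + H.T) / 2

# Jordan identity x^2 (x y) = x (x^2 y).
X2 = jordan(X, X)
jordan_identity_ok = np.allclose(jordan(X2, jordan(X, Y)), jordan(X, jordan(X2, Y)))

# Quadratic representation agrees with Y -> X Y X.
quad_rep_ok = np.allclose(quad_rep(X, Y), X @ Y @ X)

# Spectral decomposition: eigenprojections c_i = v_i v_i^T form a Jordan frame.
lam, vecs = np.linalg.eigh(X)
frame = [np.outer(vecs[:, i], vecs[:, i]) for i in range(3)]
frame_ok = (
    all(np.allclose(jordan(c, c), c) for c in frame)
    and all(
        np.allclose(jordan(frame[i], frame[j]), 0.0)
        for i in range(3) for j in range(3) if i != j
    )
    and np.allclose(sum(frame), np.eye(3))
)
spectral_ok = np.allclose(sum(l * c for l, c in zip(lam, frame)), X)

# Spectral mapping in non-increasing order, with det and tr from the eigenvalues.
spectrum = np.sort(lam)[::-1]
det_tr_ok = np.isclose(np.prod(spectrum), np.linalg.det(X)) and np.isclose(
    np.sum(spectrum), np.trace(X)
)

print(jordan_identity_ok, quad_rep_ok, frame_ok, spectral_ok, det_tr_ok)
```

Here the rank of the algebra is 3, the number of elements in the frame, in agreement with Theorem 8.1.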

It turns out [12] that the symmetric cone $\Omega$ admits a $G(\Omega)$-invariant Riemannian metric defined by $\langle u, v\rangle_x = \langle P(x)^{-1}u, v\rangle$, $x \in \Omega$, $u, v \in V$. The inversion $j(x) = x^{-1}$ is an involutive isometry fixing $e$. With this metric, $\Omega$ is a symmetric Riemannian space of non-compact type and hence is an NPC space with respect to its distance metric [IHl [IS]. The isometry properties just mentioned give symmetric cone analogs of properties (P6) and (P8) for the weighted least squares mean. Permutation invariance (P3) of the least squares mean holds in any metric space.
The unique geodesic curve joining $a$ and $b$ is $t \mapsto a \#_t b := P(a^{1/2})(P(a^{-1/2})b)^t$ and the Riemannian distance $d(a,b)$ is given by $d(a,b) = \bigl(\sum_{i=1}^r \log^2\lambda_i(P(a^{-1/2})b)\bigr)^{1/2}$. See [m [19], [22] for more details. The geodesic middle (geometric mean) of $a$ and $b$ is given by $a \# b := a \#_{1/2} b = P(a^{1/2})(P(a^{-1/2})b)^{1/2}$.
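For the cone of positive definite matrices, where $P(a^{1/2})b = a^{1/2}ba^{1/2}$, the geodesic and distance formulas can be checked numerically. The sketch below (real symmetric $2 \times 2$ sample matrices and the names `sym_fun`, `geo`, and `dist` are assumptions of the illustration) verifies that $d(a, a \#_t b) = t\,d(a,b)$, that $a \# b$ is equidistant from $a$ and $b$, and that inversion preserves the distance.

```python
import numpy as np


def sym_fun(A, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T


def geo(a, b, t):
    """Geodesic point a #_t b = P(a^{1/2}) (P(a^{-1/2}) b)^t in the matrix case."""
    ah = sym_fun(a, np.sqrt)
    aih = sym_fun(a, lambda v: 1.0 / np.sqrt(v))
    m = aih @ b @ aih
    return ah @ sym_fun((m + m.T) / 2, lambda v: v ** t) @ ah


def dist(a, b):
    """d(a, b) = (sum_i log^2 lambda_i(P(a^{-1/2}) b))^{1/2}."""
    aih = sym_fun(a, lambda v: 1.0 / np.sqrt(v))
    m = aih @ b @ aih
    lam = np.linalg.eigvalsh((m + m.T) / 2)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


# Hypothetical sample points of the cone.
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([[4.0, -1.0], [-1.0, 2.0]])
d_ab = dist(a, b)

# The curve t -> a #_t b is a constant-speed geodesic from a to b.
geodesic_ok = all(
    np.isclose(dist(a, geo(a, b, t)), t * d_ab) for t in np.linspace(0.0, 1.0, 6)
)

# The geometric mean a # b is the metric midpoint.
mid = geo(a, b, 0.5)
midpoint_ok = np.isclose(dist(a, mid), dist(mid, b))

# Inversion is an isometry.
inversion_ok = np.isclose(dist(np.linalg.inv(a), np.linalg.inv(b)), d_ab)

print(geodesic_ok, midpoint_ok, inversion_ok)
```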

In [22], [23], it is shown that the geometric mean is monotone for the cone ordering, $x \le y$ if and only if $y - x \in \overline{\Omega}$, and therefore we conclude that every symmetric cone is a Loewner-Heinz NPC space. By Theorem 6.3 and Proposition 7.3 the weighted least squares mean on a symmetric cone is monotonic and non-expansive, in particular continuous. Hence properties (P4) and (P5) are satisfied.

The AGH inequality for the two-variable case can be reduced to the case of two elements sharing a "diagonalization" over some Jordan frame, in which case the inequality follows from the real number case. The proof of Proposition 7.1 then establishes (P10) for the general weighted least squares mean.

We say that two elements $a$ and $b$ commute if they share the same Jordan frame. Then (P1) follows easily for the least squares mean.

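The properties (P2) and (P9) follow from the characterization of the least squares mean $\mathfrak{G}_n(w; a_1,\dots,a_n)$ as the unique member $x$ of $\Omega$ satisfying $\sum_{i=1}^n w_i\log(P(x^{1/2})a_i^{-1}) = 0$ (which follows from a standard method for computing the Hessian operator of the distance function on Riemannian manifolds). If $z = \mathfrak{G}_n(w; x_1,\dots,x_n)$, then $\sum_{i=1}^n w_i\log P(z^{1/2})x_i^{-1} = 0$. Setting $y = a_1^{w_1}\cdots a_n^{w_n}z$, where $a_1,\dots,a_n$ now denote positive real numbers, we have

\[
\begin{aligned}
\sum_{i=1}^n w_i\log\bigl(P(y^{1/2})(a_ix_i)^{-1}\bigr)
&= \sum_{i=1}^n w_i\log\Bigl(\frac{\prod_{j=1}^n a_j^{w_j}}{a_i}\,P(z^{1/2})x_i^{-1}\Bigr) \\
&= \sum_{i=1}^n w_i\Bigl(\log\frac{\prod_{j=1}^n a_j^{w_j}}{a_i}\,e + \log P(z^{1/2})x_i^{-1}\Bigr) = 0,
\end{aligned}
\]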
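where the second equality follows from the fact that $\log ta = \log te + \log a$ for any $t > 0$ and $a \in \Omega$. This establishes (P2). The determinantal identity follows by considering the trace functional: for $z = \mathfrak{G}_n(w; a_1,\dots,a_n)$ with $a_i \in \Omega$,

\[
\begin{aligned}
0 &= \operatorname{tr}\Bigl(\sum_{i=1}^n w_i\log\bigl(P(z^{1/2})a_i^{-1}\bigr)\Bigr) = \sum_{i=1}^n w_i\operatorname{tr}\log\bigl(P(z^{1/2})a_i^{-1}\bigr) \\
&= \sum_{i=1}^n w_i\log\det\bigl(P(z^{1/2})a_i^{-1}\bigr) = \sum_{i=1}^n w_i\log\det(z)\det(a_i^{-1}) \\
&= \sum_{i=1}^n w_i\log\det(z) - \sum_{i=1}^n w_i\log\det(a_i) = \log\det(z) - \log\prod_{i=1}^n\det(a_i)^{w_i}.
\end{aligned}
\]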
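In the two-point case the weighted least squares mean with weights $(1-t, t)$ is the geodesic point $a \#_t b$, so the characterizing equation, joint homogeneity, and the determinantal identity can all be checked in closed form for positive definite matrices. The following sketch does this on sample data. The restriction to $n = 2$, the sample matrices and scalars, and the helper names are assumptions made for the illustration.

```python
import numpy as np


def sym_fun(A, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * f(vals)) @ vecs.T


def geo(a, b, t):
    """a #_t b, the least squares mean of (a, b) for the weights (1 - t, t)."""
    ah = sym_fun(a, np.sqrt)
    aih = sym_fun(a, lambda v: 1.0 / np.sqrt(v))
    m = aih @ b @ aih
    return ah @ sym_fun((m + m.T) / 2, lambda v: v ** t) @ ah


# Hypothetical sample data.
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([[4.0, -1.0], [-1.0, 2.0]])
t = 0.3
alpha, beta = 2.0, 5.0  # positive scalars for the homogeneity check

z = geo(a, b, t)
zh = sym_fun(z, np.sqrt)


def log_term(x):
    """log(P(z^{1/2}) x^{-1}) in the matrix case, i.e. log(z^{1/2} x^{-1} z^{1/2})."""
    m = zh @ np.linalg.inv(x) @ zh
    return sym_fun((m + m.T) / 2, np.log)


# Characterizing equation: sum_i w_i log(P(z^{1/2}) a_i^{-1}) = 0.
residual = (1 - t) * log_term(a) + t * log_term(b)
karcher_ok = np.allclose(residual, 0.0, atol=1e-9)

# Joint homogeneity: mean of (alpha a, beta b) = alpha^{1-t} beta^t z.
hom_ok = np.allclose(geo(alpha * a, beta * b, t), alpha ** (1 - t) * beta ** t * z)

# Determinantal identity: det z = det(a)^{1-t} det(b)^t.
det_ok = np.isclose(
    np.linalg.det(z), np.linalg.det(a) ** (1 - t) * np.linalg.det(b) ** t
)

print(karcher_ok, hom_ok, det_ok)
```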

The joint concavity (P7) of the two-variable geometric mean is as yet unknown for 
general symmetric cones (unlike the positive definite matrix case) and hence we do 
not yet have the joint concavity property for the weighted least squares mean. 